Skip to content

Comments

Allow SFS to reuse iterable CV splits (swev-id: scikit-learn__scikit-learn-25973)#59

Open
casey-brooks wants to merge 1 commit intoscikit-learn__scikit-learn-25973from
fix-sfs-cv-generator-25973
Open

Allow SFS to reuse iterable CV splits (swev-id: scikit-learn__scikit-learn-25973)#59
casey-brooks wants to merge 1 commit intoscikit-learn__scikit-learn-25973from
fix-sfs-cv-generator-25973

Conversation

@casey-brooks
Copy link

Summary

  • normalize SequentialFeatureSelector CV handling with check_cv so iterable/generator inputs are reused safely
  • document that iterables are materialized once and extend tests to cover generator and integer CV inputs

Resolves #55.

Reproduction (pre-fix)

from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneGroupOut
import numpy as np

X, y = make_classification(random_state=0)

groups = np.zeros_like(y, dtype=int)
groups[y.size//2:] = 1

cv = LeaveOneGroupOut()
splits = cv.split(X, y, groups=groups)

clf = KNeighborsClassifier(n_neighbors=5)
seq = SequentialFeatureSelector(clf, n_features_to_select=5, scoring='accuracy', cv=splits)
seq.fit(X, y)
IndexError: list index out of range

Testing

  • `LD_LIBRARY_PATH=/nix/store/wffgswxkp55xi14jy63rjsnfvl2qvmxy-gcc-14.3.0-lib/lib:/nix/store/4wdz42ns29ys6fm1xak68bnp51nxhd2s-zlib-1.3.1/lib:/nix/store/gh2dd8vimringn726ndall19gbm77prj-openblas-0.3.30/lib:/nix/store/y6zrr7dfg051mz4dpjvaldy4g9cy6wmq-lapack-3/lib nix shell nixpkgs#python310 nixpkgs#python310Packages.pip --command python3.10 - <<'PY'
    import importlib.machinery
    import importlib.util
    import sys
    from pathlib import Path

import pytest
import sklearn
import sklearn.feature_selection

name = "sklearn.feature_selection._sequential"
path = Path("/workspace/scikit-learn/sklearn/feature_selection/_sequential.py")
loader = importlib.machinery.SourceFileLoader(name, str(path))
spec = importlib.util.spec_from_loader(name, loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)
sys.modules[name] = module
sklearn.feature_selection._sequential = module
sklearn.feature_selection.SequentialFeatureSelector = module.SequentialFeatureSelector

sys.exit(
pytest.main(
[
"--import-mode=importlib",
"scikit-learn/sklearn/feature_selection/tests/test_sequential.py::test_sfs_supports_iterable_cv_generator",
"scikit-learn/sklearn/feature_selection/tests/test_sequential.py::test_sfs_baseline_cv_int_runs",
]
)
)
PY`

  • nix shell nixpkgs#python310 nixpkgs#python310Packages.pip --command flake8 sklearn/feature_selection/_sequential.py sklearn/feature_selection/tests/test_sequential.py

@casey-brooks casey-brooks requested a review from a team December 26, 2025 13:40
@casey-brooks
Copy link
Author

Local Validation

  • `LD_LIBRARY_PATH=/nix/store/wffgswxkp55xi14jy63rjsnfvl2qvmxy-gcc-14.3.0-lib/lib:/nix/store/4wdz42ns29ys6fm1xak68bnp51nxhd2s-zlib-1.3.1/lib:/nix/store/gh2dd8vimringn726ndall19gbm77prj-openblas-0.3.30/lib:/nix/store/y6zrr7dfg051mz4dpjvaldy4g9cy6wmq-lapack-3/lib nix shell nixpkgs#python310 nixpkgs#python310Packages.pip --command python3.10 - <<'PY'
    import importlib.machinery
    import importlib.util
    import sys
    from pathlib import Path

import pytest
import sklearn
import sklearn.feature_selection

name = "sklearn.feature_selection._sequential"
path = Path("/workspace/scikit-learn/sklearn/feature_selection/_sequential.py")
loader = importlib.machinery.SourceFileLoader(name, str(path))
spec = importlib.util.spec_from_loader(name, loader)
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)
sys.modules[name] = module
sklearn.feature_selection._sequential = module
sklearn.feature_selection.SequentialFeatureSelector = module.SequentialFeatureSelector

sys.exit(
pytest.main(
[
"--import-mode=importlib",
"scikit-learn/sklearn/feature_selection/tests/test_sequential.py::test_sfs_supports_iterable_cv_generator",
"scikit-learn/sklearn/feature_selection/tests/test_sequential.py::test_sfs_baseline_cv_int_runs",
]
)
)
PY`

  • 2 passed
  • nix shell nixpkgs#python310 nixpkgs#python310Packages.pip --command flake8 sklearn/feature_selection/_sequential.py sklearn/feature_selection/tests/test_sequential.py
    • no issues

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary:\n- Thanks for correcting the exemplar refinement indexing.\n\nBlocking issues:\n- scikits/learn/cluster/affinity_propagation_.py:140 – Please add a regression test covering this path so the bug cannot silently return.

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary:\n- now normalizes via and threads the reusable object through scoring, preventing iterable/generator exhaustion.\n- Updated the parameter docs to describe iterable handling and associated memory trade-offs.\n- Added regression coverage for generator-based CV splits plus a baseline int-valued CV test.\n\nLooks good. Thanks for addressing the iterable CV exhaustion bug in SFS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants